NSF PAR Search | NSF Public Access Repository

BioTrove: A Large Curated Image Dataset Enabling AI for Biodiversity

Yang, Chih-Hsuan; Feuer, Ben; Jubery, Zaki; Deng, Zi K; Nakkab, Andre; Hasan, Md-Zahid; Chiranjeevi, Shivani; Marshall, Kelly; Baishnab, Nirmal; Singh, Asheesh K; et al (March 2025, IEEE/CVF Winter Conference on Applications of Computer Vision)

Free, publicly-accessible full text available March 31, 2026
Persistent monitoring of insect-pests on sticky traps through hierarchical transfer learning and slicing-aided hyper inference

https://doi.org/10.3389/fpls.2024.1484587

Fotouhi, Fateme; Menke, Kevin; Prestholt, Aaron; Gupta, Ashish; Carroll, Matthew E; Yang, Hsin-Jung; Skidmore, Edwin J; O’Neal, Matthew; Merchant, Nirav; Das, Sajal K; et al (November 2024, Frontiers in Plant Science)

Introduction: Effective monitoring of insect-pests is vital for safeguarding agricultural yields and ensuring food security. Recent advances in computer vision and machine learning have opened up significant possibilities of automated persistent monitoring of insect-pests through reliable detection and counting of insects in setups such as yellow sticky traps. However, this task is fraught with complexities, encompassing challenges such as laborious dataset annotation, recognizing small insect-pests in low-resolution or distant images, and the intricate variations across insect-pest life stages and species classes. Methods: To tackle these obstacles, this work investigates combining two solutions, Hierarchical Transfer Learning (HTL) and Slicing-Aided Hyper Inference (SAHI), along with applying a detection model. HTL pioneers a multi-step knowledge transfer paradigm, harnessing intermediary in-domain datasets to facilitate model adaptation. Moreover, slicing-aided hyper inference subdivides images into overlapping patches, conducting independent object detection on each patch before merging outcomes for precise, comprehensive results. Results: The outcomes underscore the substantial improvement achievable in detection results by integrating a diverse and expansive in-domain dataset within the HTL method, complemented by the utilization of SAHI. Discussion: We also present a hardware and software infrastructure for deploying such models for real-life applications. Our results can assist researchers and practitioners looking for solutions for insect-pest detection and quantification on yellow sticky traps.
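The slice-then-merge idea described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' pipeline and not the API of the SAHI library: the function names (`slice_windows`, `to_full_image`, `merge_detections`), the 512-pixel slice size, the 20% overlap, and the use of greedy non-maximum suppression as the merging step are all assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def slice_windows(width: int, height: int,
                  slice_size: int = 512, overlap: float = 0.2) -> List[Box]:
    """Return overlapping square windows that together cover the image."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(slice_size * (1 - overlap)))

    def starts(extent: int) -> List[int]:
        # One window is enough when the image is smaller than a slice.
        if extent <= slice_size:
            return [0]
        positions = list(range(0, extent - slice_size, step))
        positions.append(extent - slice_size)  # last window hugs the border
        return positions

    return [(x, y, min(x + slice_size, width), min(y + slice_size, height))
            for y in starts(height) for x in starts(width)]


def to_full_image(box: Box, window: Box) -> Box:
    """Shift a box from slice-local coordinates to full-image coordinates."""
    dx, dy = window[0], window[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def merge_detections(dets: List[Tuple[Box, float]],
                     iou_threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Greedy NMS (an assumed merge rule): keep the highest-scoring box
    among overlapping duplicates produced by neighbouring slices."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

In use, a detector (not shown here) would run on each window from `slice_windows`, each resulting box would be passed through `to_full_image`, and the pooled list would go to `merge_detections`. Small insects occupy a larger fraction of each slice than of the full trap image, which is the motivation the abstract gives for slicing.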
Full Text Available
InsectNet: Real-time identification of insects using an end-to-end machine learning pipeline

https://doi.org/10.1093/pnasnexus/pgae575

Chiranjeevi, Shivani; Saadati, Mojdeh; Deng, Zi K; Koushik, Jayanth; Jubery, Talukder Z; Mueller, Daren S; O’Neal, Matthew; Merchant, Nirav; Singh, Aarti; Singh, Asheesh K; et al (December 2024, PNAS Nexus)

Insect pests significantly impact global agricultural productivity and crop quality. Effective integrated pest management strategies require the identification of insects, including beneficial and harmful insects. Automated identification of insects under real-world conditions presents several challenges, including the need to handle intraspecies dissimilarity and interspecies similarity, life-cycle stages, camouflage, diverse imaging conditions, and variability in insect orientation. An end-to-end approach for training deep-learning models, InsectNet, is proposed to address these challenges. Our approach has the following key features: it (i) uses a large dataset of insect images collected through citizen science along with label-free self-supervised learning to train a global model, (ii) fine-tunes this global model using smaller, expert-verified regional datasets to create a local insect identification model, (iii) provides high prediction accuracy even for species with small sample sizes, (iv) is designed to enhance model trustworthiness, and (v) democratizes access through streamlined machine learning operations. This global-to-local model strategy offers a more scalable and economically viable solution for implementing advanced insect identification systems across diverse agricultural ecosystems. We report accurate identification (>96% accuracy) of numerous agriculturally and ecologically relevant insect species, including pollinators, parasitoids, predators, and harmful insects. InsectNet provides fine-grained insect species identification, works effectively in challenging backgrounds, and avoids making predictions when uncertain, increasing its utility and trustworthiness. The model and associated workflows are available through a web-based portal accessible through a computer or mobile device. We envision InsectNet to complement existing approaches, and be part of a growing suite of AI technologies for addressing agricultural challenges.
MDRepo—an open data warehouse for community-contributed molecular dynamics simulations of proteins

https://doi.org/10.1093/nar/gkae1109

Roy, Amitava; Ward, Ethan; Choi, Illyoung; Cosi, Michele; Edgin, Tony; Hughes, Travis S; Islam, Md Shafayet; Khan, Asif M; Kolekar, Aakash; Rayl, Mariah; et al (November 2024, Nucleic Acids Research)

Molecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more. Ideally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols. Here, we introduce MDRepo, a robust infrastructure that provides a relatively simple process for standardized community contribution of simulations, activates common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyber-infrastructure, and is capable of storing petabytes of simulations, while providing high bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data.
Cyber-agricultural systems for crop breeding and sustainable production

https://doi.org/10.1016/j.tplants.2023.08.001

Sarkar, Soumik; Ganapathysubramanian, Baskar; Singh, Arti; Fotouhi, Fateme; Kar, Soumyashree; Nagasubramanian, Koushik; Chowdhary, Girish; Das, Sajal K.; Kantor, George; Krishnamurthy, Adarsh; et al (February 2024, Trends in Plant Science)

Full Text Available
MDRepo – an open environment for data warehousing and knowledge discovery from molecular dynamics simulations

https://doi.org/10.1101/2024.07.11.602903

Roy, Amitava; Ward, Ethan; Choi, Illyoung; Cosi, Michele; Edgin, Tony; Hughes, Travis S; Islam, Md Shafayet; Khan, Asif M; Kolekar, Aakash; Rayl, Mariah; et al (July 2024, bioRxiv)

Background: Molecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. This enables advances in drug discovery and the design of therapeutic interventions. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more. A need: Ideally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols. A solution: Here, we introduce MDRepo, a robust infrastructure that supports a relatively simple process for standardized community contribution of simulations, activates common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyberinfrastructure, and is capable of storing petabytes of simulations, while providing high bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data.
Full Text Available
Cloud Computing for Research and Education Gets a Sweet Upgrade with CACAO

https://doi.org/10.1145/3569951.3597555

Skidmore, Edwin; Cosi, Michele; Swetnam, Tyson; Merchant, Nirav; Xu, Zhouyun; Choi, Illyoung; Davey, Sean; Frady, Jeremy; Wall, Mariah; Yung, Michelle (July 2023, ACM)
PhytoOracle: Scalable, modular phenomics data processing pipelines

https://doi.org/10.3389/fpls.2023.1112973

Gonzalez, Emmanuel M.; Zarei, Ariyan; Hendler, Nathanial; Simmons, Travis; Zarei, Arman; Demieville, Jeffrey; Strand, Robert; Rozzi, Bruno; Calleja, Sebastian; Ellingson, Holly; et al (March 2023, Frontiers in Plant Science)

As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll fluorescence 2D images, and 3D point clouds. PhytoOracle aims to (i) improve data processing efficiency; (ii) provide an extensible, reproducible computing framework; and (iii) enable data fusion of multi-modal phenomics data. PhytoOracle integrates open-source distributed computing frameworks for parallel processing on high-performance computing, cloud, and local computing environments. Each pipeline component is available as a standalone container, providing transferability, extensibility, and reproducibility. The PO pipeline extracts and associates individual plant traits across sensor modalities and collection time points, representing a unique multi-system approach to addressing the genotype-phenotype gap. To date, PO supports lettuce and sorghum phenotypic trait extraction, with a goal of widening the range of supported species in the future. At the maximum number of cores tested in this study (1,024 cores), PO processing times were: 235 minutes for 9,270 RGB images (140.7 GB), 235 minutes for 9,270 thermal images (5.4 GB), and 13 minutes for 39,678 PSII images (86.2 GB). These processing times represent end-to-end processing, from raw data to fully processed numerical phenotypic trait data. Repeatability values of 0.39-0.95 (bounding area), 0.81-0.95 (axis-aligned bounding volume), 0.79-0.94 (oriented bounding volume), 0.83-0.95 (plant height), and 0.81-0.95 (number of points) were observed in Field Scanalyzer data. We also show the ability of PO to process drone data with a repeatability of 0.55-0.95 (bounding area).
Full Text Available
A local platform for user-friendly FAIR data management and reproducible analytics

https://doi.org/10.1016/j.jbiotec.2021.08.004

Wieser, Florian; Stryeck, Sarah; Lang, Konrad; Hahn, Christoph; Thallinger, Gerhard G.; Feichtinger, Julia; Hack, Philipp; Stepponat, Manfred; Merchant, Nirav; Lindstaedt, Stefanie; et al (November 2021, Journal of Biotechnology)

Full Text Available
StarBLAST: a scalable BLAST+ solution for the classroom

https://doi.org/10.21105/jose.00102

Cosi, Michele; Forstedt, J.j.; Gonzalez, Emmanuel; Xu, Zhuoyun; Peri, Sateesh; Tuteja, Reetu; Blumberg, Kai; Campbell, Tanner; Merchant, Nirav; Lyons, Eric (April 2021, Journal of Open Source Education)
Full Text Available
